An interpretable generative multimodal neuroimaging-genomics framework for decoding Alzheimer’s disease

Alzheimer’s disease (AD) is the most prevalent form of dementia, affecting millions worldwide with a progressive decline in cognitive abilities. The AD continuum encompasses a prodormal stage known as Mild Cognitive Impairment (MCI), where patients may either progress to AD (MCIc) or remain stable (MCInc). Understanding the underlying mechanisms of AD requires complementary analysis derived from different data sources, leading to the development of multimodal deep learning models. In this study, we leveraged structural and functional Magnetic Resonance Imaging (sMRI/fMRI) to investigate the disease-induced grey matter and functional network connectivity changes. Moreover, considering AD’s strong genetic component, we introduce Single Nucleotide Polymorphisms (SNPs) as a third channel. Given such diverse inputs, missing one or more modalities is a typical concern of multimodal methods. We hence propose a novel deep learning based classification framework where generative module employing Cycle Generative Adversarial Networks (cGAN) was adopted to impute missing data within the latent space. Additionally, we adopted an Explainable Artificial Intelligence (XAI) method, Integrated Gradients (IG), to extract input features relevance, enhancing our understanding of the learned representations. Two critical tasks were addressed: AD detection and MCI conversion prediction. Experimental results showed that our framework was able to reach the state-of-the-art in the classification of CN vs AD reaching an average test accuracy of 0.926 ± 0.02. For the MCInc vs MCIc task, we achieved an average prediction accuracy of 0.711 ± 0.01 using the pre-trained model for CN and AD. The interpretability analysis revealed that the classification performance was led by significant grey matter modulations in cortical and subcortical brain areas well known for their association with AD. Moreover, impairments in sensory-motor and visual resting state network connectivity along the disease continuum, as well as mutations in SNPs defining biological processes linked to amyloid-beta and cholesterol formation clearance and regulation, were identified as contributors to the achieved performance. Overall, our integrative deep learning approach shows promise for AD detection and MCI prediction, while shading light on important biological insights.


Introduction
Alzheimer's disease (AD) is a chronic neurodegenerative disorder that affects millions of people worldwide (approximately 30 million in 2015 [1].It is the most common cause of cognitive impairment, gradually impacting the activities of a patient's daily life.It is characterized by the progressive loss of cognitive and functional abilities, including memory, language, and executive functions [2], with a temporal progression.Amyloid accumulation represents the first event, followed by tau accumulation, hypometabolism (assessed with Positron Emission Tomography (PET)), atrophy, and cognitive decline [3,4].It is hence evident that the pathology changes of AD actually begin several years before the first clinical symptoms.Therefore, a timely AD diagnosis is highly beneficial for optimizing patient care and enabling appropriate therapeutic interventions.Mild cognitive impairment (MCI) is the intermediate stage from normal cognitive function to AD, hence representing an opportunity for an early targeting of the disease.However, it includes a very heterogeneous class of patients, including subjects that will likely convert to AD, known as MCI converters (MCIc), with an estimated annual conversion rate around the 16.5% [5], and subjects that remain stable after several years, being part of the MCI non-converters (MCInc) group [6].
Among the available neuroimaging technologies, structural Magnetic Resonance Imaging (sMRI) and resting-state functional MRI (rs-fMRI) have provided unprecedented opportunities for deriving biomarkers allowing the early diagnosis of AD.For instance, sMRI is currently a key part of the diagnostic criteria for the differential diagnosis and longitudinal monitoring of patients with dementia, enabling the estimation of brain atrophy.Several studies have consistently observed both global and local atrophic changes in AD, lying along the hippocampal pathway (entorhinal cortex, hippocampus, parahippocampal gyrus, and posterior cingulate cortex) in the early stages of the disease, while atrophy in temporal, parietal and frontal neocortices emerges at later stages being associated with neuronal loss as well as with language, visuospatial and behavioral impairments [7,8].Rs-fMRI, in turn, indirectly measures the neural activity by detecting changes in the Blood Oxygenation Level Dependent (BOLD) signals, which depend on the neurovascular coupling [9].In particular, investigating functional connectivity (FC; inter-regional coupling), functional network connectivity (FNC; inter-network coupling), and functional networks from BOLD rs-fMRI provides a means for understanding the mechanisms and relevance of the functional relationships across brain regions.A growing body of rs-fMRI studies suggests that failure of specific resting-state networks (RSNs) is closely related to AD, with the default-mode network (DM) and the salience network (SN) playing a pivotal role and being disrupted before clinically evident symptoms [10].Specific alterations in these functional networks have been reported in AD patients, with prominent FC decreased within the DM and increased FC in the SN [11,12].Moreover, disconnections within and between the different RSNs have been consistently demonstrated, particularly over long connection distances [13].All of these factors have contributed to the widespread view of AD as a disconnection syndrome being characterized by a cascading network failure, beginning in the posterior DM and then shifting to other systems containing prominent connectivity hubs, possibly associated with amyloid accumulation [14].Moreover, increased evidence supports the view that tau depositions are also related to functional brain architecture and FC changes, supporting the view of transneuronal tau propagation in AD [15].Moreover, AD also features a strong genetic component.Strategies to extract linked genotype traits are commonly based on Single Nucleotide Polymorphism (SNPs) analysis [16], representing variations of single nucleotides in DNA sequences that vary from person to person [16] and are present in at least 1% of the population.Several Genome-Wide Association Studies (GWAS) have identified more than 40 AD-associated genes/loci, which are likely to increase the risk of developing the disease [17,18,19].Among them the apolipoprotein E (APOE) gene, in particular the ϵ4 allele, PICALM, CLU, ABCA7, and CR1 are the most important genes being associated with AD risk factor or the progression from MCI to AD [20], [21], [22], [23], [24], [25], [26,27].This great heterogeneity of available biomarkers offers a unique opportunity to explore various aspects of the disease continuum.Each biomarker provides valuable information about specific characteristics, enabling a multifaceted investigation of AD which calls for methods that can effectively integrate and leverage complementary information.In this regard, artificial intelligence, particularly Deep Learning (DL), emerges as a promising technology to tackle this complex task.By employing multiple layers of processing, DL models allow extracting progressively high-level and more informative features from input data.Moreover, the inclusion of multiple data sources into a single model can uncover complex and deep non-linear associations between the input features from a multimodal perspective [28].As a result, multimodality is gaining significant popularity representing the key approach to gain valuable insights into complex and multifaceted neurodegenerative diseases such as AD [29], [30], [31].However, despite the promising foreseen of such an approach, multiple drawbacks are present and represent the focus of the current research.The main concern resides in missing data management.Particularly in the biomedical domain, it is very common to incur in missing values or acquisitions for certain subjects or entire study cohorts due to different reasons, such as missing acquisition, corrupted data, and patients dropout from a study [32], privacy and expensive tests.Additionally, the interpretability of a model's predictions is a central characteristic of decision-aiding models, which is still not pervasively addressed.In the past years, many high-performing prediction models have been proposed lacking a clear rationale to be effectively considered for practical use.Strong and validated explanations associated with a given prediction are fundamental for increasing trust in the results as well as their applicability in a real-world scenario.Innovative and viable missing modality management as well as interpretability are the key attributes that a multimodal model for diagnosis detection should have.
Diving into missing data management, the simplest and most common solutions consist either of discarding samples with missing modalities, or filling the missing values with zeros [33], and computing imputations based on data interpolation [34].Such solutions are evidently suboptimal since they could significantly reduce the number of training samples, already small when addressing biomedical-related tasks, or introduce important biases in the data and the model due to the interpolation or the zero filling.Alternative methods to exploit the information of all the available subjects were proposed.A possible approach is the complementation of incomplete data representation, which consists in extracting a latent representation from both the complete and incomplete modalities, avoiding the need for data imputation [35,36].Most recent approaches aim at generating missing data either in the input space [37] or in an intermediate latent representation [38,39] relying on generative models such as Generative Adversarial Networks (GANs) [40] and their variants, or Variational Autoencoders [41].This last approach has been successfully applied to the AD continuum investigation.It allows the exploitation of the availability of multimodal data for capturing the relationship among different data sources in the latent space, enabling the generation of one modality from another.Generative models, hence, appear as the route to be followed when dealing with missing data.Interestingly, the recently proposed Cycle-consistent GANs (Cycle-GANs) [42] have shown impressive results in various knowledge translation tasks since they offer a flexible and effective approach for learning mappings across different domains without relying on paired data.
Moving to model interpretability, it is well established that with increasing model complexity, interpretability decreases drastically.However, in order to better understand the mechanisms that underlie the AD continuum and allow the models to be applied in clinical and real-life applications, it is vital to understand the reasons behind a certain output.In this case, Explainable Artificial Intelligence (XAI), becomes essential to understand why a given model made a certain prediction.XAI encompasses post-hoc methods that allow the assignment of an importance score to each feature, reflecting its role in the classification task.This means finally opening the 'black box' of complex models hopefully increasing their exploitability in clinical practice.XAI is starting to be applied in multimodal frameworks to study neurodegenerative and psychiatric diseases.Few works could be found adopting simple gradient-based or feature perturbation methods [43,44,45].On top of this, another overlooked yet stringent aspect is the strong validation of XAI methods and the obtained explanations.The evaluation of explanation methods is still under-investigated, however, since explainability is meant to increase confidence in AI, it is vital to systematically analyze the obtained results referring also to the prior knowledge derived from the state-of-the-art.
In this rapidly evolving landscape, where multimodality holds a central role, we aim at shading light on the importance of the complementary information that could be derived from advanced biomarkers such as brain FC and SNPs mutations, together with the well-established brain atrophy measures derived from sMRI in detecting AD-related modulations.With this purpose, we will present a novel multimodal framework for AD detection and MCI conversion prediction which addresses (i) heterogeneous data integration (ii) missing data management, and (iii) interpretability.Regarding the former, our framework allows the integration of heterogeneous data featuring different dimensionality, e.g., 3D volumes and vectorized data, and nature, meaning from multi-domain, e.g., imaging and genetics.Concerning the second, we propose an approach that allows generating missing modalities in the latent space obtained after input feature reduction, ensuring high accuracy in reconstructing latent features while minimizing computational demands.The goal is to define a framework that can be generalized to any missing modality, without requiring to have at least one specific modality shared by all the subjects in the considered population.Finally, we aimed at emphasizing the interpretability of the proposed framework in order to reinforce its transparency and reliability by conducting a post-hoc interpretability analysis, supplemented by a robust validation step, which enables to precisely discern the contribution of each input feature to the classification of subjects in the AD continuum.

Dataset
Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (https://adni.loni.usc.edu/).The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD.The primary goal of ADNI has been to test whether serial MRI, PET,  other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD.For up-to-date information, see www.adni-info.org.
An accurate and extensive subject selection was performed to retain, for each subject, the baseline timepoint for all the selected imaging modalities, that is sMRI and rs-fMRI.The respective genetic variants (SNPs) were then selected, if available.The final study cohort included 1911 subjects, divided into healthy controls (CN), AD, MCI non-converter (MCInc), and MCI converter (MCIc).In detail, following the classification available on the ADNI website, an MCI subject was considered as MCIc if dementia was diagnosed at any timepoint after MCI diagnosis.
Table 1 shows the AD and CN study cohorts demographic information highlighting the subjects sharing modalities, in particular, sMRI-fMRI and sMRI-SNPs.Table 2 details the MCI study cohort highlighting the percentage of missing data for each considered modality.
3D T1-w MRI and rs-fMRI acquisitions were considered as imaging input channels and were acquired with the following sequence parameters: 1) sagittal accelerated MPRAGE, TR/TE = shortest, TI = 900 ms, flip angle = 9 .More details about the data acquisition can be found in [46].Concerning the genetic data, DNA samples were genotyped using Illumina Human610-Quad or Illumina HumanOmniExpress BeadChip.

Preprocessing and feature engineering
The sMRI volumes preprocessing included tissue segmentation in Gray Matter (GM), White matter and Cerebrospinal fluid (CSF) using the modulated normalization algorithm.Only the GM volume was considered for this study, to which a smoothing using a Gaussian kernel (FWHM = 6mm) was applied.Quality control (QC) included discarding images that exhibited low correlation with individual and/or group level masks.The full preprocessed GM volume was used as input for the sMRI channel resulting in an input size of 121 × 145 × 121 for each subject.
The rs-fMRI data was preprocessed using the statistical parametric mapping toolbox (SPM12, http://www.fil.ion.ucl.ac.uk/spm/) including rigid body motion correction to correct subject head motion, slice-timing correction, warping to the standard MNI space using the EPI template, resampling to (3mm) 3 isotropic voxels, and smoothing using a Gaussian kernel (FWHM = 6mm), following the preprocessing proposed in [47].QC procedure was the same as for sMRI.Fifty-three independent components (ICs) covering the whole brain were extracted using spatially constrained ICA with the Neuromark_fMRI_1.0 template (available in the GIFT software; http://trendscenter. org/software/gift).For each subject, a correlation matrix was then created computing the Pearson correlation between the ICs time courses, resulting in a 53x53 static functional network connectivity (sFNC) matrix.This was divided in 7 RSNs, named: (i) Sub-cortical network (SC); (ii) Auditory network; (iii) Sensorimotor network (SM); (iv) Visual network (VI); (v) Cognitive-control network (CC); (vi) Default-mode network (DM); and, (vii) Cerebellar network (CB) (please, refer to Supplementary Table 2 for more information).The upper triangular matrix was then vectorized resulting in an input vector of 1378 features for each subject.
Moving to the genomics data, pre-imputation QC was performed to remove Single Nucleotide Polymorphisms (SNPs) with minor allele frequency (MAF) < 0.05, call rate < 0.98, and Hardy Weinberg Equilibrium < 10 −3 (details in https://www.synapse.org/#!Synapse:syn2290704/wiki/64710).Imputation was performed with the same reference panel.Only the SNPs with imputation r 2 > 0.4 were included.Linkage disequilibrium (LD)-based pruning with r 2 = 0.8 in a window of 50kb was applied, yielding 445838 SNPs for further analyses [48].A feature selection leveraging on the genome-wide association study (GWAS) on AD conducted by [17] was carried on to include all the relevant SNPs having GWAS p-values less than 1e −04 .A total of 565 polymorphisms was selected and a value between 0 and 2 was assigned based on the presence of mutated alleles.In detail defining wild and mutated alleles respectively as W and M a score of 0 means wild homozygous alleles (W/W), a score of 1 indicates heterozygous alleles (W/M) and a score of 2 defines mutated homozygous alleles (M/M).

Classification tasks
Following the aim of this study, the proposed architecture was devised with a twofold classification aim: (1) AD detection, that is the differentiation of AD and CN subjects, also referred as Task 1 and (2) MCI conversion prediction, that is the stratification of MCIc and MCInc patients, also referred as Task 2. In detail, the network was trained and tested on Task 1 and subsequently used to solve Task 2 allowing to assess its ability in discriminating different stages of disease and also in capturing and highlighting shared patterns across the different categories.

Framework architectures
The proposed framework builds on our previous work [30] and is shown in Figure 1.In detail, the architecture for disease detection consists of three modules: i) Feature reduction module, which performs a CNN-based feature extraction to derive a lower dimensionality latent space, separately for each input channel ii) Generative module that, in the eventuality of missing modalities, actuates a generative process in the latent space transferring the knowledge from one domain to another; iii) Data fusion & classification module that fuses the latent features obtained for the three modalities and then performs the classification.Post-hoc interpretability analysis was then carried on in order to retrieve feature contributions to the classification task.The three modules will be detailed in the following paragraphs.
Feature reduction module The feature reduction module consists of three different CNNs, one for each input channel, leading to a latent low dimensionality representation consisting of 100 latent features for each channel.The sMRI channel was analyzed through a 3D CNN defined by three successive convolutional blocks, each composed of two consecutive convolutional layers (filter sizes of 3 × 3 × 3 for the first four convolutional layers and 2 × 2 × 2 for the last two, number of filters: 64, 64, 64, 128, 128, and 128, respectively, for the six convolutional layers) and one max-pooling layer, followed by four fully-connected layers (FCLs) (number of nodes: 1536, 768, 384, and 192, respectively).The rs-fMRI channel was analyzed through a 1D CNN, composed by four convolutional layers (filter sizes 5 × 5 × 5, number of filters: 64, 128, 128, and 128, respectively), two max-pooling layers and two FCLs (number of nodes: 384 and 100, respectively).Finally, for genetic data, a 1D CNN was employed, consisting of three convolutional blocks, each including a sequence of convolutional layers (filter sizes 3 × 3 × 3, number of filters: 64, 64, and 128, respectively, for the three convolutional layers), batch-normalization and max-pooling, followed by three FCLs (number of nodes: 1024, 512, and 128, respectively).The ReLU activation function [49] was used for all the layers of these three CNNs.

Generative module
The generative module allows imputing the missing modalities in the latent space given the others, when necessary.This task is possible thanks to the injection in the complete framework of four pre-trained generators derived from two different cGANs [42].The first, the sMRI-SNPs-cGAN allows imputing the latent genetics features transferring the knowledge from the latent sMRI one and vice-versa, and the second, the sMRI-fMRI-cGAN, transfers the knowledge again from the latent sMRI features to generate the respective rs-fMRI ones, and vice-versa.An architecture fMRI-SNPs-cGAN was not developed since the number of subjects that shared both modalities was not sufficient.
The cGANs were built and trained prior to the full framework.In detail, to obtain the pretrained generators, the first step required to obtain the channels' latent representations (i.e., the 100 features vector for each modality) needed to subsequently train the cGANs.This was necessary since, as already highlighted, the proposed framework performs the Resonance Imaging (sMRI), functional MRI (fMRI) and Single Nucleotide Polymorphisms (SNPs), the framework is articulated in three different modules: i) A feature reduction module where the input channels are transformed in their low dimensionality latent representations; ii) A generative module where, if necessary, the missing modalities are imputed transferring the knowledge from one domain to another; iii) A data fusion & classification module where the latent features are fused together and then classified using a Multilayer Perceptron.Finally, post-hoc interpretability analysis is performed relying on Integrated Gradients (IG) method for feature attribution derivation.data generation directly in the latent space and not in the original input space.Since the subset of subjects having all three modalities was not numerous enough to train a classification model, two separate bi-modal models were developed, one having sMRI and SNPs (sMRI-SNPs-NN) as input channels and the other having sMRI and rs-fMRI as input channels (sMRI-fMRI-NN).Fig. 2a shows the schematic representation of the sMRI-fMRI-NN, as a reference example.In order to obtain the same latent features as the complete framework, the feature reduction for each channel and the data fusion & classification modules of these bi-modal models were the same as the ones adopted in the complete framework, and described in the previous paragraph.Once the latent vectors were obtained, they were given as input to two cGANs, namely sMRI-fMRI-cGAN and sMRI-SNPs-cGAN, one for each bi-modal model, and hence for each information transfer, from sMRI to rs-fMRI and from sMRI to SNPs, respectively.Fig. 2b shows the sample sMRI-fMRI-cGAN architecture used to generate sMRI from rs-fMRI and vice-versa.The generators are composed of six FCLs (number of nodes: 256, 512, 512, 1024, 512, and 100, respectively) activated by a ReLU activation function.The discriminators consisted of four FCLs (number of nodes: 256, 128, 64, and 1, respectively) activated by the LeakyRelu function [49], alternated with three dropout layers (dropout probability: 0.3).The same architecture was considered for the sMRI-SNPs-cGAN.

Data fusion & classification module The data fusion & classification module consists of a fusion layer and a classifier.
In this module, the latent features obtained either from the data reduction module or the data generation module (for the subjects with missing modalities) are fused together through vector concatenation, resulting in 300 features.A Multilayer Perceptron (MLP) composed of three FCLs (number of nodes: 150, 75, and 2, respectively) activated by the ReLU function was then used for classification.Softmax was used as activation function for the output layer, allowing to obtain the classification probability associated with the input data.Real Fake  Hyperparameter tuning was performed empirically.The number of layers, channels, nodes, and the filters' size were tuned considering the single-modality architectures for the classification (Task 1) before training the multimodal framework.The weights of the CNNs and classifier, used in both bi-modal models and the final pipeline, were initialed as follows: the 3D CNN for sMRI, the 1D CNN for SNPs, and the classifier were initialized using the Xavier Uniform distribution, while the weights of the 1D CNN for rs-fMRI were initialized using the Xavier Normal distribution.The weights of the generators and discriminators were initialized following a Xavier Normal distribution [50] for both the sMRI-fMRI-cGAN and sMRI-SNPs-cGAN.The weights initialization was independent between Step 1 and Step 2.

Training scheme
Training Step 1: Bi-modal models and cGANs In the training Step 1 the two sMRI-fMRI-NN and sMRI-SNPs-NN bi-modal models were trained for the AD and CN classification (Step 1a, Fig. 3) in order to subsequently extract the latent features to be fed to the two cGANs, sMRI-fMRI-cGAN and sMRI-SNPs-cGAN(Step 1b, Fig. 3).More in-depth, the bi-model models and subsequently the cGANs were trained relying on the AD and CN subjects sharing respectively the sMRI and rs-fMRI acquisition and the sMRI and SNPs acquisition, presented in Table 1.A 10-folds stratified Cross Validation (CV) procedure for a total of 40 epochs, Adam optimizer [51] with a learning rate of 0.0001, and mini-batch technique considering a batch size of 14 and 9 were selected respectively for the training of the sMRI-fMRI-NN and the sMRI-SNPs-NN.Weighted Cross Entropy (CE) was considered as the loss function.Differently from the classical CE, it allows computing a weight guaranteeing higher value to the less numerous class, to address the unbalanced classes Step 1a: training the bi-modal classification model Step 1b: training the CycleGAN issue.The best model in terms of validation accuracy was retained for both the NNs and was used to obtain the 100 latent features for each subject and for each modality.In detail, the sMRI-fMRI-NN and the sMRI-SNPs-NN achieved an average validation accuracy of 0.890 ± 0.03 and 0.864 ± 0.03, and then the best models in terms of accuracy for both bi-modal models were retained for extracting the 100 latent features vectors obtained at the end of each feature reduction module for each subject.
The newly obtained feature set was then used to train the two cGANs.A mini-batch technique, keeping the same subjects in each batch as the bi-modal models, was used for both the sMRI-fMRI-cGAN and the sMRI-SNPs-cGAN training.Adam optimizer with a learning rate of 0.0001 was used to train the generators for a total of 600 epochs.As in [42], the adversarial losses were considered as optimization targets for each cGAN and are represented in Fig. 2c.In detail, considering the sMRI-fMRI-cGAN as reference, given s and f the latent feature vector for the sMRI and rs-fMRI, G f s and G sf the generators to obtain s from f and vice-versa and D s and D f the discriminators to distinguish the s from the generated ŝ and the f from the generated f , respectively, the adversarial loss penalizes the D s and D f errors in discriminating between ŝ = G f s (f ) and s, and f = G sf (s) and f , respectively.The major issue arising when considering this loss solely is that with a large enough capacity, the generators G f s and G sf can map the same set of input images to any random permutation of images in the target domain (latent features in our case) [42].The cycle consistency losses were hence introduced to limit the space of possible mapping functions by also penalizing the difference between f and G sf (G f s (f )), and between s and G f s (G sf (s)), respectively, hence completing the cycle (Fig. 2d).
The four pre-trained generators, two for the sMRI-fMRI-cGAN and two for the sMRI-SNPs-cGAN, were then extracted and injected in the latent space of the proposed multimodal framework for the on-line generation of the missing modalities during the full framework training.

Of note, no data leakage was present between the training Step 1 (bi-modal models and cGANs) and the training
Step 2 (full multimodal framework) since Step 2 involved end-to-end training of subjects with no missing modalities (i.e., using input data for these subjects only) and generative models trained in Step 1b for deriving latent features for subjects with missing modalities only.Of note, the full cohort of subjects was not used to train the bi-modal models and corresponding cGANs (Step 1a and 1b), only relying on those subjects sharing the modalities under analysis (sMRI and rs-fMRI for sMRI-fMRI-NN/cGAN, and sMRI and SNPs for sMRI-SNPs-NN/cGAN).
The cycle consistency loss was used in both sMRI-fMRI-cGAN and sMRI-SNPs-cGAN in order to assess the performance of the generators in imputing missing modalities using the Mean Absolute Error (MAE).

Training
Step 2: Full multimodal framework The full multimodal framework including the three input channels, sMRI, rs-fMRI, and SNPs, was trained for classification Task 1, AD detection, relying on the complete AD and CN data cohort.
Subjects were split into training and testing sets, respectively, including the 80%-20% of the study cohort.The testing set was kept unseen until the last testing phase.A stratified 10-fold CV procedure was applied to the training set.A mini-batch strategy (12 subjects in each batch) was adopted during training.Adam optimizer with a learning rate of 0.0001 was chosen and the model was trained for a total of 70 epochs.Weighted CE was considered as the loss function.During this training phase, the four pre-trained cGAN generators' weights were kept frozen.Indeed, during the backpropagation phase, only the weights of the feature reduction module and the data fusion and classification module were updated.

Testing and performance evaluation
The best model, in terms of validation accuracy, obtained from the CV procedure of training Step 2 was considered for the testing phase for the two classification tasks presented in Section 2.3.The testing set retained from the AD and CN study cohort splitting was used for classification Task 1, AD detection.The full MCI cohort, never considered during the training phase, hence kept completely unseen by the model, was considered as a testing set for the classification Task 2, MCI conversion prediction.The framework was finally evaluated in terms of accuracy (ACC), precision (PRE), recall (REC), F1 score, Area Under Precision Recall Curve (AUPRC), and Area Under the Curve (AUC) on both testing sets for the two classification tasks.
Five independent runs reshuffling the train and test sets were performed for probing the generalization capabilities of the model, and the average test performance over the different runs was retained.All the training and testing described were performed in Python, specifically relying on the Pytorch library [52].

Post-hoc interpretability analysis
The post-hoc interpretability analysis was performed relying on Integrated Gradients (IG) [53], aiming at obtaining attribution maps describing the relevance score associated with each input feature on the different input channels.IG maps were derived for the subjects in the testing set of the best model over the five generalization runs for both the classification tasks (Task 1: AD detection and Task 2: MCI prediction conversion, which coincides with the full MCI cohort).A brief introduction of the IG method and its properties followed by the attribution analysis performed for each input channel will be given in the following sections.

Integrated Gradients (IG)
Integrated Gradients (IG) [53] is an interpretability method allowing to find the relevance of the input features with respect to the prediction of the model without requiring any modification to the network under analysis.It is a Baseline Attribution Method (BAM) [54].To obtain feature attributions, IG computes the path integral of the gradients along the straightline path from a given baseline x ′ to the input x [53] following the equation where F is the model and α is a factor ranging in [0, 1], determining the steps along the straightline path from x to x ′ .This method is gaining high popularity among the post-hoc interpretability methods since it satisfies three important axioms: (i) Sensitivity, for which a null score is given to the input features which do not contribute to the prediction; (ii) Implementation invariance, hence given a certain input, the derived attributions for two functionally equivalent networks are as well equivalent; and (iii) Completeness, which ensures that the attributions approximately add up to the difference between the input and the baseline prediction scores [53].
Following the definition, IG attribution scores can assume both positive and negative values.The higher or lower the value of a given feature is, the higher its contribution to the prediction outcome is, either with a positive or negative effect.
It is evident how the baseline choice impacts on the obtained attribution maps.Some recent studies have investigated this issue in order to raise awareness among the users and show the impact of different baselines on the IG scores obtained for the same input and model [55,56,57].Of note, the main requirement for the baseline is to represent a neutral input for the model under investigation.In other words, it should produce a zero score or, equivalently, around 0.5 prediction probability for both classes resulting from the activation function of the final classification layer.For the proposed framework, a triplet of neutral baselines, one for each input modality, was selected after exhaustive empirical research, defined as follows: (i) 3D matrix of zeros for sMRI; (ii) 1D vector of zeros for rs-fMRI; and (iii) 1D vector of Gaussian noise for SNPs, which allowed on average to obtain random classification scores.

IG feature attribution analysis
IG allows obtaining attribution maps for each input and channel, which lay in the same input space.In our study IG was applied on the testing set of each classification task, resulting in a triplet of attribution maps representing sMRI, rs-fMRI, and SNPs feature attributions lying in the same space as the input data, for each subject and task.The different maps will be referred to as sMRI-IG, fMRI-IG, and SNPs-IG in what follows.

Neuroimaging
In order to retain the most relevant features, an initial sMRI-IG map thresholding was carried on, selecting only the attribution values exceeding in absolute value the 99.5 th percentile of the respective IG values distribution voxelwise.The average sMRI-IG values for 55 cortical and subcortical brain regions based on the Harvard-Oxford atlas [58] were then extracted for all the subjects in the testing sets of both classification tasks.Only the regions where at least 99% of the subjects had average values exceeding the percentile threshold were kept for subsequent analysis.
Moving to the fMRI-IG, only the connections whose attribution scores were in absolute value above the 98 th percentile of the respective IG value distribution connection-wise were considered for further analysis.Different thresholds were chosen between the IG maps due to the different input sizes.
A statistical group-based analysis was then performed considering the attribution maps derived for each class, namely CN, MCInc, MCIc, and AD.Shapiro-Wilk test of normality [59] was initially performed on the sMRI-IG and fMRI-IG derived features, revealing non-normally distributed features.The non-parametric Kruskal-Wallis test [60] was hence performed to check the statistical difference across the four classes considered in each feature derived either from rs-fMRI (most relevant connections resulting from fMRI-IG) or sMRI (most relevant brain regions derived from the sMRI-IG).When significance was found, the pairwise Wilcoxon Rank Sum Test [61] was performed between all the possible couples of the four considered groups (AD-CN, AD-MCIc, AD-MCInc, CN-MCIc, CN-MCInc, MCIc-MCInc) and adjusted for multiple comparisons with Bonferroni correction.All the statistical analyses were performed in R.

Genetics
Only the SNPs with attribution greater than the 65 th percentile of the SNPs-IG distribution, hence those having positive attribution, were kept.The Ensembl Variant Effect Predictor (VEP) tool [62] was used to annotate the selected SNPs in their corresponding genes.Web interface of VEP (https://useast.ensembl.org/Tools/VEP)and default settings were used to run the analysis.Selected settings included finding the co-located known variants and 1000 Genomes global minor allele frequency as frequency data for co-located variants parameters.No filtering was applied to the analysis to not exclude relevant biological data.Enrichment analysis was then performed using the clusterProfiler package in R [63] to derive the most important biological processes in which the annotated genes were represented.A background of 16714 genes was selected by annotating the whole SNPs-set of the GWAS study used for feature selection of the input data.The enrichment results were corrected with the Benjamini-Hochberg procedure.The SNPs annotation, as well as the enrichment analysis, was computed only on the classes of patients, namely the AD, MCInc, and MCIc individuals.
Classification performance and IG attribution analysis results will be firstly presented separately for classification Task 1 and Task 2. An overview of the overall group-based analysis will then follow.Fig. 4 shows the average IG maps obtained for the AD detection task and for each input channel, averaged over the correctly classified subjects per class (test set).The jet colorbar was used to highlight the attribution scores, where red and blue correspond to positive and negative attribution values, respectively.In detail, Fig. 4A shows the average sMRI-IG maps for the correctly classified CN and AD patients, overlaid onto the MNI152 template (1.5 mm).A thresholding was applied in order to visualize only the relevance scores above the 99.5 th percentile of the relevance distribution, followed by a Gaussian smoothing with FWHM = 3 mm for visualization purposes.It is evident that the subcortical regions, in particular the hippocampus, are associated with high IG scores in absolute value, with the highest positive (negative) values for CN (AD).Cortical regions generally showed almost null relevance for both the CN and the AD-derived sMRI-IG maps.Of note, higher absolute values were found for the AD maps compared with the CN ones.
Moving to the fMRI-IG qualitative analysis, Fig. 4B shows the average connectograms for CN and AD subjects, including only the connections above the 98 th percentile of the relevance distribution.The ICs-brain region correspondences are detailed in Supplementary Table 2.
The most relevant RSNs were the primary information processing-related networks (SM and VI) followed by multisensory integration networks (CC and DM), and cerebellum (CB), showing high relevance for both the CN and the AD-derived fMRI-IG.In particular, the 53% of the most relevant connections for the CN group belong to the SM network, as opposed to the 38% found for the AD-derived fMRI-IG maps.On the other hand, the VI was highly involved for AD subjects, with the 28% of relevant connections belonging to this RSN, differently from the CN where only 2 VI ICs resulted as relevant.Of note, a total of 15 ICs were marked as relevant for both the CN and the AD-derived fMRI-IG, but with an opposite sign.
More in detail of the relevant connections between the different ICs, the CN subjects showed positive relevant intra-network connections in the CB and SM RSNs, with particularly high scores for the connections involving the post/paracentral and parietal gyri (ICs 9, 10, 11, 12, 13).Inter-network positively relevant connections were also found between the SM and the CB, and the SM and the CC RSNs.Only two negative IG scores were instead retrieved, related to an intra-network connection in the VI network and to an inter-network connection between CC and DM.
A similar pattern was found for the AD fMRI-IG relevant connections but with generally opposite IG-associated scores.In detail, negative relevance was mainly found for both the intra-and inter-network connections encompassing the SM and CB RSNs, with an intra-connection in SM (ICs 11 − 12) showing the highest negative relevance between the same ICs highlighted with the opposite sign in the CN.On the contrary, positive relevance was recorded for the connections between multiple VI ICs, in particular involving right middle occipital gyrus (IC 21) with cuneus (IC 20), inferior occipital gyrus (IC 23), and middle temporal gyrus (IC 25) as well as the inter-network connections VI-DM and CC-DM, with the latter being the most relevant one.
Finally, concerning the SNPs-IG qualitative analysis, Fig. 4C presents a Manhattan plot including only the SNPs with an attributed positive IG score higher than the 65 th percentile, grouped by chromosome.The y-axis reports the associated IG score.A complementary trend between the IG values associated with the SNPs was found between the two classes.This was particularly evident for Chr 1, 7, 11, 14, 15 and 19.A generally high involvement of the Chr 7, 8, 11, 14, and 19 was present for both the AD and CN, with the majority of SNPs selected in these Chr showing high IG scores, either positively or negatively.Of interest, the most relevant SNPs, with positive IG were found in Chr 4, 7, and 8 for the CN and in Chr 12 and 14 for the AD.On the contrary, the most negatively relevant SNPs for CN were found in Chr 14 and 19, as opposed to AD where they were mainly in Chr 1, 6, and 7.

Task 2: MCI conversion prediction
The MCI prediction task was performed by testing the full MCI cohort on the best model obtained for the AD detection task, for each one of the five independent runs.This allowed to assess its viability in predicting MCI conversion to AD, by correctly stratifying MCIc and MCInc subjects, without being trained for the specific task.This procedure allowed to reach an ACC of 0.711 ± 0.01, a PRE of 0.558 ± 0.03, a REC of 0.610 ± 0.03, 0.581 ± 0.01 in F1 score, 0.470 ± 0.02 in AUPRC,and finally an AUC of 0.755 ± 0.01 for the differentiation between MCIc and MCInc.Regarding the performance retrieved using the best model trained on the classification Task 1, we achieved 0.711 in ACC, 0.612 in PRE, 0.550 in REC, 0.579 in F1 score, 0.505 in AUPRC, and 0.766 in AUC.

Feature relevance
In parallel with the results shown for Task 1, Fig. 5 shows a summary of the IG results obtained for the MCI conversion prediction task.The same visualization techniques presented for Task 1 were applied also for this figure.The maps were averaged over the correctly classified subjects for each class, the jet colorbar was used to highlight the attribution scores, and thresholding was applied to retain only the values above the 99.5 th , 98 th and 65 th percentile of the relevance distribution, respectively for the sMRI-IG, fMRI-IG, and the SNPs-IG.The sMRI-IG maps were smoothed and overlaid  onto the MNI152 template.In detail, Fig. 5A shows the average sMRI-IG maps for the correctly classified MCIc and MCInc patients.Similarly to what was shown for Task 1, the subcortical regions held the highest relevance, with the hippocampus being clearly highlighted as the most important region with positive attributions for MCInc and negative values for MCIc.Cortical regions did not show particular relevance, except for some temporal and occipital areas, which however showed low IG scores.
Moving to the fMRI-IG, Fig. 5B shows the averaged connectograms.In accordance with Task 1, the same five RSNs resulted as the most relevant also for the MCI prediction conversion task, namely the SM, VI, CC, DM and CB.The MCInc fMRI-IG showed more involvement of VI (25% of the relevant ICs), compared with the MCIc one (14% of the relevant ICs).On the contrary, the MCIc IG connectome showed a higher number of ICs belonging to the CC network (4 ICs) compared to the two found for the MCInc one.Of note, the two connectograms had 11 common connections with opposite trends.
More in details of the relevant connections between the different ICs, of interest, the most relevant connections between the ICs in the SM were exactly the same for MCIc and MCInc, with opposite signs, and mainly involved the post/paracentral and parietal gyri (ICs 9, 10, 11, 12, 13), as for Task 1.Moreover, the fMRI-IG for the MCInc subjects showed positive relevance also for the intra-network connections in the CB, and for the inter-network connections involving CB-SM and CC-SM.On the other hand, negative relevance was found for the intra-network connections in the VI, mainly involving occipital, lingual, calcarine, middle temporal gyri, and cuneus areas (ICs 17 − 24, 18 − 20, 21 − 25, 23 − 24), and for the inter-network connections between VI and DM, and CC and DM.A slightly different pattern was found for the MCIc fMRI-IG, whose relevant connections generally had a negative IG-associated score.
In detail, intra-network connections with negative scores were found for SM and CB, as already presented, while inter-network negative connections were depicted for VI-CC, VI-DM, and SM-DM.Of interest, negative connections between CC and SM were also retrieved, never reported for the other classes in the study.Positive attribution was found only for an intra-network connection in the VI (ICs 23 − 24) and a connection between CC and DM (ICs 28 − 45).
Finally, concerning the SNPs-IG, Fig. 5C presents a Manhattan plot highlighting the SNPs with an attributed positive IG score higher than the 65 th percentile.As expected and as already seen for the other modalities, a complementary trend between the IG values associated with the SNPs was found for the two classes, with positive weights being associated with the MCInc and negative weights associated with the MCIc for the same SNPs and vice-versa.This was particularly evident for Chr

Group-based statistical IG analysis
The IG scores obtained for the four groups of subjects in the study were analyzed together in order to quantify the differences and similarities outlined in the qualitative analysis.In what follows, the statistical analyses will be presented separately for the neuroimaging (sMRI and rs-fMRI) and the genetic channels.
Neuroimaging channels Fig. 6A shows the boxplots representing the distribution of the average IG values for each subject in the most relevant cortical and subcortical brain regions (ROIs).Of note, only those regions where at least the 99% of the subjects had an IG value were selected for these analyses.The complete acronym list is reported in Supplementary Table 1.A general agreement between the CN and the MCInc subjects, as well as between the AD and the MCIc patients is evident in all the considered ROIs, with the AD/MCIc subjects showing a generally higher variance.As expected from the qualitative analysis, the Hipp and the Amy resulted as the highest relevant ROIs, as well as the ones showing the highest distance between the groups, with high positive attribution for the CN and the MCInc, and strong negative attribution for the AD and MCIc.Among the other subcortical ROIs, the Acc showed a notable relevance for all the groups, with the MCInc and CN having negative scores and the MCIc and AD positive ones.Cau and Put had an associated high positive relevance for AD and MCIc, while almost null scores for the other two groups.
Interestingly, among the cortical regions, the TOF was the most relevant region for both the MCIc and AD (positive IG scores) and the MCInc and CN (negative IG scores).The highest distance between the group IG scores distributions was found for PhGa, followed by the TFCp with the AD and MCIc having high negative scores.On the contrary, the OFG, the POpC, and the PaGp showed high positive relevance for AD and MCIc patients associated with negative scores for the CN and MCInc.
In order to evaluate whether these differences were significant, a Kruskal-Wallis test was performed separately for each ROI, considering the four groups together.The significant ROIs (all except the Ins and the TP) were further analyzed to investigate the group-related differences through the Wilcoxon Rank Sum test.Results are shown in Fig. 6B.The heatmap reports the Bonferroni corrected p-values (6 pairwise comparisons) in logarithmic scale and reverse hot colorbar (yellow/red correspond to higher/lower p-value, white stands for the non-significant comparisons).The most significant differences were found between MCIc and MCInc in the subcortical ROIs, resulting as generally significant (except for Put and Pall) with the lowest p-values being recorded for Hipp and Amy.Moreover, significant differences were also recorded for the temporal (PcC, PhGa, PaGp, TFCp) and occipital (TOF, OFG, POpC) cortical regions, with the most significant difference being recorded for PcC and PhGa.A coherent significance pattern was found between the contrasts MCIc-CN, MCInc-AD, and the CN-AD with generally higher p-values.Of interest, no statistically significant differences were found in the sMRI-IG scores distributions between AD and MCIc patients, except for PcC, TFCp, and POpC, while some significant differences were found between CN and MCInc but with relatively high p-values.
The same statistical analysis was carried out for fMRI-IG, considering all the most relevant connections resulting from the fMRI-IG scores for all the considered classes.The connections between IC18 (VI) -IC14 (SM) and IC44 (DM) -IC17 (VI) did not reach significance at the Kruskal-Wallis test, therefore they were excluded from the post-hoc analysis.The Bonferroni-corrected pairwise tests (Wilcoxon rank sum) are reported in Fig. 6C, separately for intra-and   inter-network connections, with the p-values being represented in logarithmic scale.The related boxplots are reported in Supplementary Fig. 1.Opposed to the sMRI-IG significance pattern, the most significant differences were found between the CN-AD and MCInc-AD contrasts, which showed significant differences for almost all the connections considered.In particular, for the MCInc-AD contrast, significant intra-network connections were part of the SM, : Enrichment analysis results.The top 15 biological processes are reported for the three classes of patients, namely AD, MCIc, and MCInc (columns).The enrichment analysis was performed over the genes annotated from the SNPs-IG.The colorbar represents the associated p-value, dark red the lowest, while the number of genes involved is reported as the bar height.The * symbol indicates the biological processes statistically significant after p-value correction.
involving connections in the parietal lobe, and in the VI RSNs.Concerning the inter-network connection, significance was found between ICs in the DM, part of the cingulate gyrus (IC47, IC45), and ICs in the CC or VI RSNs, as well as between ICs in the SM, located in the parietal lobe and ICs in the CC or CB RSNs.MCIc-CN and MCInc-MCIc showed the same significant pattern as the CN-AD contrast, showing generally higher p-values.Of interest, as for the sMRI-IG statistical analysis, no significant differences were detected for the contrasts MCIc-AD and MCInc-CN, except for two connections between CB and SM (IC51-IC9 and the IC51-IC12 respectively for the MCInc-CN and the MCIc-AD).
Genetic channel Finally, Figure 7 shows the biological processes derived from enrichment analysis for AD, MCIc and MCInc classes, considering for the gene annotation step only the SNPs having positive attributions higher than the 65 th percentile.Supplementary Tables 3, 4, and 5 show additional information, such as the most frequent genes, the GO index, and the corresponding raw and adjusted p-value for each significant biological process.The most significant biological processes were found for MCIc.Among these were biological processes related to amyloid-beta (Aβ), e.g., the amyloid-beta formation and amyloid-beta metabolic process.Of interest, other biological processes relevant to MCIc were related to the cholesterol and lipoprotein/phospholipid, such as cholesterol efflux, regulation of sterol transport, regulation of cholesterol transport, protein-lipid complex assembly, phospholipid efflux, and plasma lipoprotein particle organization.The AD-related biological processes were principally related to Aβ regulation, such as regulation of metalloendopeptidase activity involved in amyloid precursor protein catabolic process, negative regulation of metalloendopeptidase activity involved in amyloid precursor protein catabolic process, positive regulation of amyloid fibril formation, negative regulation of amyloid precursor protein catabolic process, regulation of aspartictype endopeptidase activity involved in amyloid precursor protein catabolic process, and regulation of amyloid-beta formation (the last three were not statistically significant after correction, but they were considering the raw p-value).The most frequent genes present in these biological processes, for both AD and MCIc, and hence being among the SNPs-IG annotated ones, were APOE, PICALM, SORL1, ABCA7, APOC1, and APOA2.The MCInc-related biological processes showed a major difference from the other two groups.In detail, the biological pathways were mostly related to the regulation of the complement activation or immune response, B cells, and leukocytes.For this group and the obtained biological processes, the most frequent genes were CR1, CR1L, FCER1G, ERCC1, and PLCG2.

Discussion
With this work, we proposed a novel multimodal generative and interpretable method, which holds the potential of addressing missing data management while focusing on model interpretability.We specifically applied this method to the crucial tasks of diagnosing AD and assessed its viability in detecting also MCI conversion to AD.By integrating neuroimaging (sMRI and rs-fMRI) and genetics (SNPs) data through DL-based feature extraction, we introduced a novel multimodal data fusion framework, relying on input data not yet simultaneously investigated in the AD continuum in the current literature.Incomplete data is handled by generating missing modalities in the latent space, obtained after feature reduction.This ensures high accuracy in reconstructing latent features while minimizing computational demands.To accomplish this, we relied on pre-trained generators from two cGANs, one trained to generate sMRI and SNPs data, and the other to generate sMRI and rs-fMRI modalities.The approach is however generalizable to any missing modality.The framework is trained to detect AD in a cohort including CN and AD subjects, while its generalization capabilities were then tested by predicting MCI conversion to AD in an independent study cohort.Furthermore, we meticulously conducted a post-hoc interpretability analysis, supplemented by a robust validation step, consolidating the findings' impact in the field.In this section we will first give a technical discussion focused on the classification accuracies and the missing data management, we will then discuss the interpretability analysis and the obtained results from a more neuroscientific point of view.

Classification performance and missing data management
Focusing on the Task 1, AD detection, our model reached the state-of-the-art performance, obtaining an average testing accuracy of 0.926 ± 0.02, with the best model reaching an accuracy score of 0.964.This achievement is even more noteworthy considering that only 12% of the subjects had complete data across all three modalities.Of note, our dataset was constructed considering the subjects that at least had the sMRI modality, however, our proposed framework showcases the capability to potentially impute all three modalities by transferring knowledge across domains, not being limited to having at least one acquisition for all the subjects.Indeed, the injection of the pre-trained generators, that is the two cGANs, allows both to impute the missing rs-fMRI or SNPs from the sMRI and the vice-versa, meaning that it would allow deriving the missing sMRI from the other two modalities if needed.This allows relaxing the requirement of having at least one acquisition for all the subjects which is one of the strengths of the proposed model compared with the state-of-the-art.
Table 3a shows the performance of the proposed model compared with the state-of-the-art in detecting AD, considering works allowing multimodality and missing data management.The simplest model [33] proposed a DL-based multimodal method based on sMRI, clinics, and genetics, excluding features if missing in more than 70% of subjects and filling the remaining missing data with zeros.Their classification task was different from ours since they included the MCI subjects in a three-class classification task.While they achieved an accuracy of 0.780 for the classification between CN, MCI, and AD, the accuracy dropped to 0.630 when classifying CN from the full patient cohort using only sMRI and genetics.Moreover, the proposed technique has some drawbacks, mainly linked to the biases introduced in the network, as already discussed.
Alternatively, methods to exploit the information of all the available subjects, without explicitly generating the missing modalities were proposed relying on latent representation learning.[35] developed an Auto-Encoder-based multiview missing data completion framework using sMRI and FDG-PET ROI-based features.Their method achieved a classification accuracy of 0.836 for classifying CN versus AD, even with 50% missing PET.Similarly, [64] proposed a GAN with an attention layer to generate missing FDG-PET from available MRI features, resulting in a classification accuracy of 0.914 ± 0.19.While these approaches demonstrated excellent results, their framework was limited in handling arbitrary missing modalities, allowing only the completion of missing PET from available sMRI and FDG-PET features.
Finally, [38] presented a task-induced pyramid and attention generative adversarial network (TPA-GAN) to generate missing FDG-PET data from sMRI, achieving an accuracy of 0.927.However, the model was trained and tested on two independent databases with different acquisition protocols, which on one side allowed to evaluate the generalizability of the model, but on the other could bias the results due to the different data source.Moreover, it still lacked the ability to reconstruct sMRI from FDG-PET data.Despite the aforementioned limitations, they are currently the state-of-the-art performance for multimodal MCI conversion prediction, accounting for the 30% of missing PET data.
To address classification Task 2, we directly tested the model trained for Task 1 on the MCI cohort.Only the 11% of the dataset shared all the modalities, making it a challenging scenario.Nevertheless, our framework achieved an average accuracy of 0.711 ± 0.01 for the independent test sets.Although it did not outperform methods specifically trained on this particular task, it demonstrated competitive results.Other approaches have focused on different input data and missing data management to solve this task.For instance, [34] proposed simple machine learning methods with tabular features and limited imputation techniques, achieving an accuracy of 0.670 to stratify MCI subjects.[37] used a GAN to impute missing PET images from sMRI scans, achieving an accuracy of 0.657 for discriminating MCInc vs MCIc patients.Furthermore, [39] proposed a framework for projecting original features into a latent representation, resulting in an accuracy of 0.743 considering a 51% of missing PET.Finally, [38] applied the approach already discussed also to address the classification Task 2, MCI conversion prediction reaching the best accuracy score of 0.753.
Despite the promising results obtained by generative models for missing data imputation, it is essential to acknowledge their shared limitation: they all relied on having sMRI as a prerequisite to impute PET data, and very few included genetics or rs-fMRI information in their analyses, which have instead been demonstrated as relevant biomarkers for AD [10,15].In contrast, our framework does not necessitate a common modality shared by all subjects.Utilizing two cGANs during the generation phase enables to produce the missing latent rs-fMRI and/or SNPs from sMRI.In addition, this process is applicable bidirectionally, allowing the generation of sMRI latent features from either SNPs or rs-fMRI as well.The versatility of this approach opens the way to its generalization to additional modalities through training distinct cGANs and integrating the resulting generators into the complete classification framework.Furthermore, our model's performance confirms that generative models allow to obtain realistic data and learn nonlinear mappings across the different acquisitions, achieving competitive prediction accuracy also with a substantial proportion of missing modalities.

Interpretability analyses
In our work we adopted a post-hoc interpretability analysis through IG, followed by a group-based statistical analysis.More in detail, for each input modality and each task, relevance maps were extracted and analyzed from a qualitative and quantitative point of view.We recall that being IG baseline-based, the feature relevance is to be considered baseline-dependent.The baseline represents a neutral input to the network, that is an input surrogate subject that for the network could equivalently belong to one class or the other.Hence, the baseline chosen should be validated [55,56,57].In a binary classification, the sign of IG could generally be referred to as an increase in the associated feature, on the contrary, a negative value could be associated with a decrease in a feature value.
Interestingly, among the state-of-the-art methods for multimodal AD detection or MCI conversion prediction, only a few works introduced interpretability analysis.[33] proposed an interpretable analysis masking one feature at a time and measuring the drop in accuracy, uncovering that when considering the three modalities together (namely clinics, sMRI, and genetics) the most relevant features were AD-related clinical biomarkers, while when considering solely sMRI and genetics the most relevant features were brain areas such as the hippocampus and amygdala, with limited relevance of the SNP features.Concerning DL and generative-based frameworks, [64], thanks to the adoption of an attention layer in their generation module, obtained feature importance for the generator and discriminator which allowed to analyze the most relevant sMRI features for the generation of the missing PET, as well as the most relevant PET features for the discriminator.However, such relevance was not widely discussed in the paper and it was only limited to the generation phase and not to the classification.Finally, [39] exploited the weights learned for their latent representation learning to derive the input feature importance, finding relevant regions highly related to AD and MCI diagnosis such as the hippocampus and amygdala.
In our results, the interpretability analysis revealed similarities between the MCIc and the AD, as well as between the MCInc and CN, while highlighting significant differences between the CN and AD, as expected, but also between MCIc and MCInc which are of high interest since they would allow to identify early AD biomarkers.
Beginning with the sMRI data, our findings highlighted that sub-cortical regions carried more informative and relevant information for the final classification compared to cortical areas.The hippocampus resulted as the most critical region for all classes under analysis, alongside the amygdala.In AD and MCIc patients, such regions exhibited negative attribution values, suggesting a decrease in GM probability, as opposed to positive values in CN and MCInc subjects.Of interest, the most significant differences in the relevance assigned to such regions were found between MCIc and MCInc.These findings were in line with the well-known literature findings and established hallmarks of AD.Indeed, the pathological process initially affects the hippocampus and amygdala regions before extending to other nearby structures [65,66].Focusing on MCI, [67] demonstrated that MCIc patients exhibited higher atrophy in the hippocampus and amygdala compared to MCInc and CN subjects, which aligns well with our findings.Regarding the caudate region, [68] and [69] highlighted a negative correlation between caudate and hippocampus volumes in healthy subjects, and, in addition, the second work also showed that patients with AD and non-specified dementia have a larger caudate volume compared to non-dementia subjects.However, this is still debated in literature with other works suggesting that the caudate is susceptible to atrophy, resulting in a reduction of GM of this region in AD patients [70], [71], [72].
The cortical areas had a generally lower relevance to the classification for all the classes under analysis.The parahippocampal gyrus (anterior division) and the occipital fusiform gyrus showed the highest attribution scores, with positive values assigned to CN.This was in line with [73] and [67] findings which revealed smaller volumes in the anterior parahippocampal area in AD, MCIc, and MCInc subjects compared to CN individuals.Additionally, [74] highlighted the importance of the occipital fusiform gyrus in their framework for AD classification.Moving to the other relevant brain regions, the precuneus and fusiform cortices showed a reduction of the GM volume in our results, coherent with what is known from the literature [75], [76], [77].Overall, our findings on sMRI revealed that the relevance scores were sensible to an increase in neurodegeneration in some focal brain areas in AD and MCIc compared with CN and MCInc, which aligns well with the literature findings.
Moving to rs-fMRI attributions, five functional RSNs emerged as highly relevant for the final classification of both tasks, namely SM, DM, CB, VI, and CC.The DM has a central role in information integration and processing, and its involvement in AD is well-known in the current literature, with several studies consistently demonstrating that this is the first RSN to be affected by abnormal protein aggregation [78,14].In our results, we did not retrieve intra-network connections in the DM mode.However, for the MCIc subjects, a negative relevance score, suggesting a decreased FC, was found for inter-network connections between DM and visual/sensorimotor areas (DM-VI, DM-SM), while a positive inter-network connection was highlighted between DM and CC.Conversely, AD subjects revealed a positive relevance for a few connections between the DM and VI and between the DM and the CC, while one negative connection was found between DM and SM.These patterns are not commonly reported in the literature but deserve further investigation since they could represent compensation connectivity patterns.
Among the other RSNs, the SM was the most present and relevant, showing negative relevance scores for AD and MCIc for both the intra-connections, involving post/para central and partial gyri, and inter-connections with the other relevant RSNs.On the contrary, for the CN and, importantly, for the MCInc the SM showed positive relevance, leading to significant differences for both intra-and inter-network connections with the most severe groups (AD and MCIc).The lowest p-values were found when comparing MCInc vs AD as well as CN vs AD over the different SM intra-network connections.Inter-and intra-network SM connections were indeed demonstrated to be affected by AD pathology, generally showing an overall decreased connectivity in the later stages of the disease [79], [80], [78].Moreover, [81] showed that many pyramidal and extrapyramidal motor impairments affect a substantial portion of AD patients, even at an early stage of the disease, and progressively worsen along with cognitive impairment, reflecting a possible decreased connectivity in the SM network.
Interestingly, the VI RSN was involved in the classification of the different AD stages.In detail, positive relevance was assigned to AD subjects in both intra-and inter-connections (VI-DM and VI-CC), while a different pattern was found for MCInc, which showed negative relevance in almost the same ICs.Of note, the MCIc did not show high involvement of the VI intra-connections, while few negative inter-connections were found between VI and the other relevant RSN (SM, CC, and DM).This revealed another difference between AD and its prodromal stages, indicating a decrease in VI connectivity in the MCI stage, particularly evident for the MCInc, which then converts into hyperconnectivity in the most severe stage, full-blown dementia.The damage in VI due to AD was previously shown by [78].In detail, [81] found that subgroups of AD patients have concomitant eye diseases, and some visual functions are impaired, which could be caused by impairments in the VI RSN.Moreover, recent studies demonstrated a hyperconnectivity pattern in the most severe stages of the disease present in the VI network [82].Hence, further investigation would be needed for studying the involvement of VI RSN in the AD continuum.
Finally, the inter-network connections between the CB and the SM resulted negatively relevant, hence affected by the disease with a decreased connectivity in both AD and MCIc, while the same was not recorded for the MCInc, unraveling a possible impact on this RSN in a later stage compared to the others.
Moving to the genetic relevance, the significant biological processes for MCInc, MCIc, and AD were derived from the genes annotated starting from the most significant SNPs resulting from the interpretability analysis on our two tasks.Biological processes related to Aβ were found both in AD and MCIc, and they were related to the regulation and formation of amyloid-beta structures and amyloid fibrillar, such as amyloid-beta formation, amyloid-beta metabolic process, and regulation of amyloid-beta formation, and positive regulation of amyloid fibril formation, which are biological processes related to AD [83], [84], [85], [86].Moreover, particularly in MCIc, we found biological processes related to cholesterol, like regulation of cholesterol transport, regulation of sterol transport, and cholesterol efflux, which, on a deeper analysis, were shown as possibly being associated with the AD continuum.Excess cholesterol deposit in the brain was demonstrated to be related to an increase of amyloid-beta plaques and amyloid cascade leading to synaptic plasticity annihilation, promotion of tau phosphorylation, hence contributing to the risk and pathogenesis of AD, possibly in an early phase [87,88].Regarding the particular genes, APOE, PICALM, APOA2, APOC1, ABCA7, and SORL1 were the most frequent in the biological processes.Of note, they all have a relevant impact in the development of AD [45,89].In particular, APOE is notably the major risk factor of AD [90] and is mainly expressed in both the brain and the liver.More in depth, the ApolipoproteinE is a ligand receptor-mediated endocytosis of lipoprotein particles [90] and is the major cholesterol transport and other lipids in the brain [91], [92].This gene is hence one of the responsible of the most important biomarkers for AD, namely amyloid-beta plaques containing amyloid-β peptides and the neurofibrillary tangles containing hyperphosphorylated tau proteins [93].ABCA7 is one of the most important risk genes for AD [27] that mainly regulates the processes related to cholesterol and the processing and clearance of Aβ proteins [94].ABCA7-expressed proteins have a relevant role in the control of cholesterol metabolism, then the cholesterol has a strong influence in the regulation of Aβ synthesis [94,95].Additionally, some studies reported also that different variants of ABCA7 are associated with an increase of Aβ deposition in MCI patients rather than AD [96], similar to what we found in our results, where the ABCA7 gene was involved in MCIc biological processes (Supplementary Table 4).Moving to the other relevant genes, PICALM is involved in the production, modulation, and clearance of Aβ complexes [23].Also, for this gene, interactions with APOE have been demonstrated [24].Another important gene related to AD is SORL1.Some studies found associations between AD and this gene [97,98].SORL1 is primarily involved in generating Aβ.In detail, a reduction of SORL1 protein in brain tissue increases the production of Aβ protein [97,99].Finally, [24] and [26] highlighted also CR1 as relevant factor related to AD pathogenesis.

Main contributions and outcomes
We proposed an interpretable and generative framework for multimodal AD detection and MCI conversion prediction.The main contributions of this work are: (i) reaching the state-of-the-art of multimodal and generative models in the classification of CN vs AD; (ii) reaching competitive performance in the classification of MCI cohort using a pretrained framework; (iii) managing missing data introducing a generation module in the latent space allowing to impute missing modalities in a lower space with respect to the input space, and without requiring the presence of at least one modality for all the subjects; and (iv) proposing an interpretability analysis based on IG, which to the best of our knowledge was never applied for studying the neurodegenerative diseases.
The complementary analysis of three different input modalities allowed to uncover the disease signatures at multiple levels of analysis and contemporary present.We were indeed able to confirm the presence of atrophy, particularly involving the hippocampus in later stages (MCIc and AD), together with FNC impairment, particularly in the information processing-related RSN (SM and VI).The former showed negative relevance scores, hence suggesting a decreased inter-and intra-connectivity in both the MCIc and the AD stages, while the latter, the VI, showed negative relevance in the MCInc stage, being poorly relevant in the MCIc while being positively attributed in the AD subjects, suggesting a compensation mechanism leading to a hyperconnectivity in such RSN in the later stages of the disease.Such brain modulations were present along with the mutation of relevant SNPs linked to the impairment of different biological processes related to amyloid and cholesterol regulation and formation for the MCIc and AD subjects, while mostly related to the complement activation, B cell processes, and immune response pathways for the MCInc.

Limitations and future works
In what follows, the main open issues will be briefly summarized, paving the way for further research.First, concerning the MCI conversion prediction task, in this work, our aim was to test the model generalizability by straightforwardly testing it on the MCI cohort.Training and fine-tuning on the specific task would allow obtaining better classification results and will be the object of future analysis.Moreover, being the model intrinsically multimodal and easily expandable, different channels should be considered, such as clinical information and PET imaging which have been demonstrated to be highly discriminative for the disease [4], as well as advanced structural connectivity metrics derived from diffusion MRI which could help to shed light on early disease signatures.Moreover, it would be interesting to explore different subject classifications, either proposing a multiclass prediction or defining biologically homogeneous groups following the A/T/N (amyloid, tau, neurodegeneration) system which has been attracting increasing attention in recent years [100].Finally, concerning the interpretability analysis, we strongly point toward deeper exploitation of XAI methods to open the way to not yet studied associations or mechanisms that are instead captured by the model.However, we acknowledge that the interpretation of results is not always straightforward, especially when relying on baseline-based XAI approaches, where using not well-chosen baselines strongly impacts on the results.One future aim is, hence, to establish clear rules for the choice of the baseline, enabling the unambiguous interpretation of the attributions as well as the investigation of other post-hoc methods pursuing the robustness and reliability of the interpretation.Moreover, the analysis of the associations between the different input features would be of high impact, eventually exploiting causality models to uncover the temporal sequence of the different biomarkers modulations in the disease continuum.

Conclusion
In this work, we presented a multimodal generative and interpretable DL-based framework for AD detection and MCI conversion prediction.Neuroimaging (structural and functional features) and genetics data were used to address these classification tasks.The generation of missing modalities in the latent space using four pre-trained generators of two different cGANs allowed to obtain competitive classification performance in both tasks.The application of an interpretability method yielded our model to be interpretable extracting the relevance of each input feature and revealing the most important ones for each class, highlighting disease structural, functional, and genetic signatures and opening the way to further analyses.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figure 1 :
Figure 1: Schematic representation of the proposed framework.Given three input modalities, structural Magnetic Resonance Imaging (sMRI), functional MRI (fMRI) and Single Nucleotide Polymorphisms (SNPs), the framework is articulated in three different modules: i) A feature reduction module where the input channels are transformed in their low dimensionality latent representations; ii) A generative module where, if necessary, the missing modalities are imputed transferring the knowledge from one domain to another; iii) A data fusion & classification module where the latent features are fused together and then classified using a Multilayer Perceptron.Finally, post-hoc interpretability analysis is performed relying on Integrated Gradients (IG) method for feature attribution derivation.

Figure 2 :
Figure 2: Schematic representation of a sample bi-modal framework (sMRI-fMRI-NN) followed by the respective cGAN (sMRI-fMRI-cGAN).(a) The sMRI-fMRI-NN is composed of two different modules: feature reduction module, equal to the respective module for each channel in the full framework, and data fusion & classification module where the latent features are fused together and then classified.(b) After training, the latent features of both modalities obtained from the feature reduction module were extracted for each subject and given as input for the training of the associated cGAN, whose loss is described in (c) and (d).

Figure 3
Figure 3 illustrates the training workflow of the entire framework.As already specified, the training phase was not performed end-to-end but was divided into two steps, necessary for the correct training of the generation module: (i) Step 1: Training of bi-modal models and cGANs; (ii) Step 2: Training of the full multimodal framework.

Figure 3 :
Figure 3: Framework training scheme.The training was performed in two steps: Step 1 involved the training of two bi-modal models (sMRI-fMRI-NN and sMRI-SNPs-NN, Step 1a) and the respective cGANs (sMRI-fMRI-cGAN and sMRI-SNPs-cGAN, Step 1b); Step 2 was needed for the training of the complete multimodal framework.

4. 1
Task 1: AD detectionThe generators for both sMRI-fMRI-cGAN and sMRI-SNPs-cGAN models were tested by computing the MAE across the validation sets to compare the true latent space and the reconstructed one.For the sMRI-fMRI-cGAN, the generators achieved a MAE of 0.065 ± 0.01 and 0.074 ± 0.01 for the fMRI and the sMRI modalities, respectively.Meanwhile, the sMRI-SNPs-cGAN generators reached a MAE of 0.506 ± 0.04 and 0.333 ± 0.10 for SNPs and sMRI data, respectively.The proposed framework reached 0.926 ± 0.02 in ACC, 0.910 ± 0.05 in PRE, 0.876 ± 0.03 in REC, 0.891 ± 0.03 in F1 score, 0.829 ± 0.03 in AUPRC, and 0.960 ± 0.01 in AUC for the differentiation between AD and CN subjects.Of note, the best model over the five generalization runs reached 0.964 in ACC, 0.984 in PRE, 0.909 in REC, 0.945 in F1, 0.876 in AUPRC, and 0.970 in AUC.4.1.1Feature relevanceA.

Figure 4 :
Figure 4: Overview of IG attribution maps for the classification Task 1, AD detection.All the IG maps are presented for the correctly classified CN and AD subjects in the testing set. A. Average sMRI-IG maps, thresholded to retain IG scores exceeding the 99.5 th percentile and overlaid to MNI152 template; B. Average fMRI-IG derived connectograms, thresholded to retain the connections with an associated IG score over the 98 th percentile; C. Average SNPs-IG scores highlighting the SNPs with an associated positive IG score exceeding the 65 th percentile, SNPs are grouped by chromosome.

Figure 5 :
Figure 5: Overview of IG attribution maps for the classification Task 2, MCI conversion prediction.All the IG maps are presented for the correctly classified MCIc and MCInc subjects in the testing set. A. Average sMRI-IG maps, thresholded to retain IG scores exceeding the 99.5 th percentile overlaid to MNI152 template; B. Average fMRI-IG derived connectograms, thresholded to retain the connection with an associated IG score over the 98 th percentile; C. Average SNPs-IG scores highlighting the SNPs with an associated positive IG score exceeding the 65 th percentile, grouped by chromosome.

Figure 6 :
Figure 6: Overview of the neuroimaging statistical analysis.A. Boxplots showing the CN, AD, MCInc, and MCIc correctly classified subjects distribution in the most relevant cortical and subcortical brain regions, resulting from the sMRI-IG; B. sMRI-IG Wilcoxon test results for each group comparison and each brain region, divided into cortical and subcortical regions; C. fMRI-IG Wilcoxon test results for each group comparison for each relevant brain connection divided into extra-(upper) and intra-network (lower) connectivity.The p-values for B. and C. are reported in negative logarithmic scale, dark red the most significant.White cells represent no statistical significance.

Figure 7
Figure7: Enrichment analysis results.The top 15 biological processes are reported for the three classes of patients, namely AD, MCIc, and MCInc (columns).The enrichment analysis was performed over the genes annotated from the SNPs-IG.The colorbar represents the associated p-value, dark red the lowest, while the number of genes involved is reported as the bar height.The * symbol indicates the biological processes statistically significant after p-value correction.

Table 2 :
Demographic information and missing data percentage of the MCI study cohort.MMSE = Mini-Mental State

training of both bi-modal models and CycleGANs Step 2: training the multi-modal model
1, 2, 7, 14, 15 and 17.A generally high involvement of the Chr 11 and 17 was present for both the MCIc and MCInc, with the majority of SNPs showing high IG scores.Of interest, the most relevant SNPs, with positive IG were found in Chr 17 for the MCInc and in Chr 15 for the MCIc.On the contrary, the most negatively relevant SNPs for MCInc were found in Chr 14, 15, and 17, as opposed to the MCIc where they were mainly in Chr 1, 2, and 7.

Table 3 :
Comparison of the proposed model with other state-of-the-art methods dealing with missing modalities for AD detection and MCI conversion tasks.(a)AD detection task.Accuracy (ACC), precision (PRE), and recall (REC) metrics are reported on the test set or averaged during the cross-validation phase (mean ± std).Yes * CN vs PAT (AD+MCI) classification relying only on sMRI and SNPs, † Three classes classification, CN vs MCI vs AD relying on the three modalities